|
The Cray-2 was a four-processor vector supercomputer built with emitter-coupled logic (ECL) and made by Cray Research starting in 1985. At 1.9 GFLOPS peak performance, it was the fastest machine in the world when it was released, replacing the Cray X-MP in that spot. The Cray-2 was in turn replaced as the world's fastest computer by the ETA-10G in 1990.

== Initial design ==

With the successful launch of his famed Cray-1, Seymour Cray turned to the design of its successor. By 1979 he had become fed up with management interruptions in what was now a large company and, as he had done in the past, decided to resign his management post and move to form a new lab. As with his original move from Control Data headquarters in Minneapolis, Minnesota to Chippewa Falls, Wisconsin, Cray management understood his needs and supported his move to a new lab in Boulder, Colorado. Working as an independent consultant at these new Cray Labs, he put together a team and started on a completely new design. This lab would later close, and a decade later a new facility would open in Colorado Springs.

Cray had previously attacked the problem of increased speed with three simultaneous advances: more functional units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. The classic example of this design is the CDC 8600, which packed four CDC 7600-like machines based on ECL into a 1 × 1 meter cylinder and ran them at an 8 ns cycle time (125 MHz). Unfortunately, the density needed to achieve this cycle time led to the machine's downfall. The circuit boards inside were densely packed, and since even a single malfunctioning transistor would cause an entire module to fail, packing more of them onto the cards greatly increased the chance of failure.

One solution to this problem, one that most computer vendors had already moved to, was to use integrated circuits (ICs) instead of individual components.
Each IC included a selection of components from a module, pre-wired into a circuit by the automated construction process. If an IC did not work, another one would be tried. At the time the 8600 was being designed, the simple MOSFET-based technology did not offer the speed Cray needed. Relentless improvements changed things by the mid-1970s, however, and the Cray-1 had been able to use newer ICs and still run at a respectable 12.5 ns cycle time (80 MHz). In fact, the Cray-1 was actually somewhat faster than the 8600 because it packed considerably more logic into the system due to the ICs' small size.

Although IC design continued to improve, the physical size of the ICs was constrained largely by mechanical limits; the resulting component had to be large enough to solder into a system. Dramatic improvements in density were possible, as the rapid improvement in microprocessor design was showing, but for the type of ICs used by Cray, ones representing a very small part of a complete circuit, the design had plateaued. To gain another 10-fold increase in performance over the Cray-1, the goal Cray aimed for, the machine would have to grow more complex. So once again he turned to an 8600-like solution: doubling the clock speed through increased density, adding more of these smaller processors into the basic system, and then attempting to deal with the problem of getting heat out of the machine.

Another design problem was the increasing performance gap between the processor and main memory. In the era of the CDC 6600, memory ran at the same speed as the processor, and the main problem was feeding data into it. Cray solved this by adding ten smaller computers to the system, allowing them to deal with the slower external storage (disks and tapes) and "squirt" data into memory when the main processor was busy.
This solution no longer offered any advantages; memory was large enough that entire data sets could be read into it, but the processors ran so much faster than memory that they would often spend long times waiting for data to arrive. Adding four processors simply made this problem worse.

To avoid this problem, the new design banked memory, and two sets of registers (the B- and T-registers) were replaced with a 16 KWord block of the very fastest memory possible, called a ''Local Memory'' (not a cache), with the four ''background processors'' attached to it by separate high-speed pipes. This Local Memory was fed data by a dedicated ''foreground processor'', which was in turn attached to the main memory through a Gbit/s channel per CPU; X-MPs by contrast had three channels, for two simultaneous loads and a store, and Y-MP/C-90s had five channels to avoid the von Neumann bottleneck. It was the foreground processor's task to "run" the computer, handling storage and making efficient use of the multiple channels into main memory. It drove the background processors by passing in the instructions they should run via eight 16-word buffers, instead of tying up the existing cache pipes to the background processors. Modern CPUs use a variation of this design as well, although the foreground processor is now referred to as the ''load/store unit'' and is not a complete machine unto itself.

Main memory banks were arranged in quadrants to be accessed at the same time, allowing programmers to scatter their data across memory to gain higher parallelism. The downside to this approach is that the cost of setting up the ''scatter/gather unit'' in the foreground processor was fairly high. Access strides that conflicted with the number of memory banks suffered a performance penalty (latency), as occasionally happened in power-of-2 FFT-based algorithms. As the Cray-2 had a much larger memory than Cray-1s or X-MPs, this problem was easily rectified by adding an extra unused element to an array to spread the work out.
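
The stride conflict and the padding fix can be illustrated with a rough sketch. It assumes a simplified model in which consecutive words are interleaved across banks by address modulo the bank count; the bank count of 128, the array sizes, and the helper names <code>bank_of</code> and <code>distinct_banks</code> are illustrative assumptions, not a description of the Cray-2's actual addressing hardware.

```python
NUM_BANKS = 128  # assumed bank count, chosen only for illustration


def bank_of(address, num_banks=NUM_BANKS):
    """Bank holding a word under simple low-order interleaving (an assumed model)."""
    return address % num_banks


def distinct_banks(stride, accesses, num_banks=NUM_BANKS):
    """Count how many different banks a run of strided references touches."""
    return len({bank_of(i * stride, num_banks) for i in range(accesses)})


# Walking down one column of a row-major array uses a stride equal to the
# row length. A power-of-2 row length (1024 words) is a multiple of the bank
# count, so every reference lands in the same bank.
unpadded = distinct_banks(stride=1024, accesses=128)

# Padding each row with one unused element changes the stride to 1025, so
# successive references fall in successive banks.
padded = distinct_banks(stride=1025, accesses=128)

print(unpadded, padded)  # prints "1 128"
```

In this model the unpadded column walk serializes on a single bank, while the padded layout spreads the same 128 references over all 128 banks at the cost of one wasted word per row.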
Excerpt source: Wikipedia, the free encyclopedia.
|